home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The Business Master (3rd Edition)
/
The Business Master (3rd Edition).iso
/
files
/
spreotus
/
123pwrwk
/
sregr.doc
< prev
next >
Wrap
Text File
|
1984-08-22
|
33KB
|
727 lines
SREGR - Simple Linear Regression
1. INTRODUCTION
Linear regression is a method of finding the "best" straight
line through a set of points. Each point is described by two
values, and if there is a linear relationship between the values,
the points will lie close to a straight line when plotted on a
graph. For example, we might look for such a relationship
between:
- sales of the IBM PC and sales of the PC version of Lotus
1-2-3
- sales of Lotus 1-2-3 and of books about 1-2-3
- prime interest rate and business bankruptcies
In each of the above examples, we might be interested in
estimating the value of the second item (Y, the dependent var-
iable) given a value for the first item (X, the independent
variable). The SREGR worksheet will calculate the equation of the
best-fit straight line through pairs of X-Y values, and will
calculate estimates of Y, given X values, from the line. This
worksheet is a subset of the LREGR worksheet (contained on the
Volume 1B disk). It does not calculate any confidence limits, or
the significance of the Student's t value for the regression. As
a result, it can accept more pairs of input values for a
particular main memory configuration, and it calculates faster
than LREGR.
The worksheet accepts pairs of values (labelled Y and X), and
calculates the values of a and b in the equation Y = a + b X of
the best-fit straight line. a is often called the intercept,
because it is the point at which the line intercepts the Y-axis on
a graph, and b is the slope.
In addition to calculating the intercept and slope, the
worksheet calculates two of the parameters a statistician uses to
appraise the quality of the fit. These are:
- the correlation coefficient (r), which measures the degree
of association between X and Y. The value of r always lies
between 1 and -1. A value of zero indicates no
relationship; a value of +1 or -1 indicates a perfect
relationship. The correlation coefficient is negative when
larger values of X are associated with smaller values of Y.
- the value of Student's t for the hypothesis that the slope
(b) is zero (ie. changing the value of X doesn't have a
significant effect on the value of Y). The larger the value
of t, the less we are inclined to accept the hypothesis.
Note the SREGR worksheet does not calculate the significance
of this t-value in probability terms (but you can look it up
in standard statistical tables).
08/21/84 SREGR - 1
Statistical tables exist for the correlation coefficient and
Student's t, which give the probability that such a value (or
better) could be obtained purely by chance from random data.
For the purpose of estimating the coefficients (a and b) of
the regression equation, the worksheet uses all X-values for which
Y-values are supplied. These X-Y pairs must be entered in
consecutive rows, beginning at the top of the input area.
However, you may enter more X-values below the ones paired with
Y-values. The worksheet estimates the Y-values corresponding to
all the X-values you enter. This is useful if you want to use the
regression line as a forecasting pr estimating tool.
2. USING SREGR
The main menu of SREGR offers you seven choices (plus Quit).
They are as follows:
Input
This choice allows you to enter data to feed the
calculations, and to change the limit on the number of data
items you may input. There are two types of data: the raw
X-Y pairs from which the regression line is derived, and
additional X values for which Y estimates are calculated.
Note that changing the maximum number of data items involves
erasing all input data. Therefore, if you think you might
need more space than the worksheet initially allows (300
pairs), you should change the limit before you enter any
data.
Blank
You should choose this option if you have finished with the
data in the worksheet and want to start again with an empty
worksheet. Note that this option wipes out all calculated
results as well as the raw data. (You must confirm this
choice to reduce the risk of accidental erasure.)
Calc
Make this choice when you want to calculate or recalculate
the results. If you have changed data or entered more data
since you last calculated, you must recalculate before
displaying new results or graphs.
Results
This option guides you through the various results areas in
the worksheet. The main results are all shown in the home
screen. But there is a screen full of intermediate results
(averages and sums of squares, for example). And alongside
the columns in which you entered the raw data are calculated
Y estimates. To print your results, select Print from the
main menu (see below).
Graph
One of the clearest ways to show a linear regression is
08/21/84 SREGR - 2
though a graph. This worksheet offers one graph - a simple
one showing the original data points and the regression line
through them.
Print
The print option allows you to send selected areas of the
worksheet to your printer. You can select two different
areas: the final and intermediate results areas (as one
page), and the raw data and Y estimates.
Agenda
If you select this option (and confirm it when requested),
the current worksheet is lost and replaced by the main
agenda. Then you can load another worksheet from the disk.
Note confirmation is required before this option proceeds,
because it will overwrite the worksheet in memory.
2.1 Input Details
The input phase allows you to enter data in a variety of
ways. The input menu has five choices (plus Quit).
The first and second choices (XYPairs and YXPairs) are very
similar. If your source data is arranged as two columns of
numbers, these choices will handle all pointer movement as you
enter your data row-by-row. If the X column is on the left,
select XYPairs; if the Y column is on the left, select YXPairs.
Then enter your data one row at a time. Type the number on the
left first, then the number on the right. After each number,
press the ENTER key. Note that in the worksheet the Y column is
always on the left. Selecting XYPairs or YXPairs merely
determines which column receives the first number of each pair you
type. These two options (and Est-X) are convenient if you want to
use the numeric pad on your keyboard to enter the numbers, because
they perform all the necessary cursor movement for you. You can
press the Num Lock key to disable the cursor movement keys, and
use the numeric pad for numbers. (Remember to press the Num Lock
key again when you've finished entering your data, to re-activate
the cursor movement keys.) When you have entered all your data,
hold down the Ctrl key and press Break. This will stop the macro
and exit to 1-2-3 READY mode. Then hold down the Alt key and
press I to return to the input menu. If you choose either XYPairs
or YXPairs when you have already entered some data, the new data
will be added to the bottom of the existing pairs (but overlaying
any unpaired X values for which Y estimates were required). To
get rid of existing pairs, you must use the Blank option from the
main menu.
The third choice is Est-X. It allows you to add a column of
unpaired X values below the existing data. The worksheet will
calculate a Y estimate for each unpaired X value you enter. Type
each value for X, and press the ENTER key. When you have entered
all the values, hold down the Ctrl key and press Break. This will
stop the macro and exit to 1-2-3 READY mode. Then hold down the
Alt key and press I to return to the input menu.
08/21/84 SREGR - 3
The fourth choice uses the standard 1-2-3 Range Input facil
ity. In some respects this is convenient, because it confines the
cell pointer to the unprotected cells in the X and Y input
columns. And pressing the Esc key returns you directly to the
input menu. However, you are responsible for all cursor movement
within the input area. So you cannot easily use the numeric pad
to enter your numbers with this option. Remember to enter Y
values on the left (column B), and X values on the right (column
C).
The final choice (except Quit) in this menu is Number. It
allows you to set the maximum number of input values. When the
SREGR worksheet is first loaded, it is limited to 300 pairs of
input values. If your machine has less than 256K of memory, you
will not be able to enter even this many pairs without a 'Memory
Full' error. However, if your machine has more memory, you may
use this option to increase the amount of data you can input. You
must confirm your intention to use this option, because it will
erase any data in the input area.
There will come a time when you have entered data into the
worksheet, but you need to re-size the input area (for example,
because you underestimated the amount of data). When you re-size
the area, the worksheet will erase your input. But you can use
the /File Xtract option of 1-2-3 to save the data you have entered
and avoid re-typing it. Here's how.
To save the input data, first exit to 1-2-3 READY mode,
then press the Goto key (F5), type the word 'input', and
press the ENTER key. You will find yourself at the top left
corner of the input area. Then type the following 1-2-3
command (initial letters only):
/File Xtract Values
1-2-3 will prompt you for a file name. Choose a name for
the data, type it, and press ENTER. If a file with this
name already exists, 1-2-3 will ask you if it may be
replaced (ie. if 1-2-3 may overwrite it). When you have
chosen the file name, 1-2-3 asks you to specify the range to
be saved. Press the keys Right End Down in sequence,
followed by the ENTER key. Here, Right means the 6 key on
the numeric pad (in cursor mode), and Down means the 2 key.
1-2-3 will then save your data on disk, and you may proceed
to re-size the input area. (Hold down the Alt key and press
M to return to the main menu.)
When you have re-sized the input area, you can retrieve
your data from the disk by exiting to 1-2-3 READY mode and
moving to the top left corner of the input area (as
described above). Then type the following 1-2-3 command
(initial letters only):
/File Combine Copy Entire-file
1-2-3 will present a menu of the files on disk, from which
08/21/84 SREGR - 4
you should select the one you saved (or type its name) and
press the ENTER key. 1-2-3 will then retrieve your data
from the disk into the worksheet. Hold down the Alt key and
press M to return to the main menu.
There is one more option available to you for inputting data
to this worksheet. That is to use the 1-2-3 commands. For
example, you may already have the input data for your regression
analysis stored in a worksheet on disk. If the data occupies two
adjacent columns in the worksheet, with Y values in the left
column, you can use 1-2-3's /File Combine Copy command to copy the
data into the LREGR worksheet. To do this, simply choose Quit at
the main menu, press the Goto key (F5), type the word 'input', and
press ENTER. You will find yourself at the top left corner of the
data input area. Now you can type /fcc and follow the remaining
1-2-3 instructions to copy the data from the worksheet on disk.
When you have finished, hold down the Alt key and press I to
return to the input menu. Note, do not use /Move to move data
around in the worksheet. Moving data into a named range
invalidates the range name along with any formulas that depend on
it. You will probably destroy the worksheet's ability to
calculate properly if you use /Move commands in it.
2.2 Graph Details
The macro that displays the graph in this worksheet
deliberately avoids setting many of the graph parameters. For
example, the labels for the X- and Y- axes are initially set when
the worksheet is loaded, and are not changed. If you wish to use
your own labels, exit to 1-2-3 READY mode and change them. Then
hold down the Alt key and press M to return to the main menu. The
following is a list of the graph options that are set by the
macro. Attempts to reset these options will be futile.
- Graph subtitle
- Graph type and ranges
- Legends for ranges
- Range formats
To get a printed copy of any graph, first display it using
the graph menu. The last-displayed graph is called the current
graph. Then exit to 1-2-3 READY mode. Press the Graph key (F10),
and verify the graph. If you wish, you may use the 1-2-3 /Graph
commands to modify the graph before printing. Then type the
following 1-2-3 command (initial letters only):
/Graph Save
1-2-3 will ask you to specify a file name, and save the graph in
that file. You may then use the PrintGraph program to print the
graph. For instructions on how to use the PrintGraph program,
read the PrintGraph section of the 1-2-3 manual.
2.3 Print Details
The print macros assume you have a printer capable of 80-
08/21/84 SREGR - 5
column printing. Before printing each option, the macros allow
you to align the paper to the top of a new page. Press the ENTER
key when you are ready for printing to begin. At the end, the
macro will feed the paper to the top of a new page so that,
normally, you won't have to keep re-aligning the paper.
The final and intermediate results are printed as one,
because they will comfortably fit onto a 66-line page. The third
option can lead to several pages of output. The macro defines the
top border of the page to be the rows of the worksheet that
identify the columns being printed.
3. EXAMPLES
3.1 Radio or Television?
In the U.K., one must buy a television licence before one can
legally receive television programs. Prior to 1971, a radio
licence was required to receive radio signals, unless one bought a
television licence which covered both. With the rise of tele-
vision in the home, one might expect a fall in the number of radio
licences issued. Indeed, one might expect that for every
television licence bought, one less radio licence would be sold,
since most people owned radios before they bought televisions.
The numbers of radio and television licences issued for the years
1949 to 1968 are shown in the following table:
Radio and Television Licences
Issued in the United Kingdom.
YEAR RADIO TELEVISION
(Millions)
1949 11.7 0.1
1950 12.0 0.3
1951 11.6 0.7
1952 11.2 1.4
1953 10.7 2.2
1954 10.2 3.2
1955 9.4 4.4
1956 8.6 5.6
1957 7.6 6.8
1958 6.6 8.0
1959 5.4 9.2
1960 4.5 10.4
1961 3.9 11.2
1962 3.5 11.9
1963 3.2 12.5
1964 3.0 13.0
1965 2.8 13.3
1966 2.6 13.7
1967 2.5 14.3
1968 2.5 15.0
This table clearly shows the growth of television licences at
08/21/84 SREGR - 6
the expense of radio licences. The data in this table is
contained on your disk in a worksheet named TV. To use it to
illustrate the regression analysis, perform the following steps:
1. Load the SREGR worksheet and when the main menu appears,
Press Q to exit to 1-2-3 READY mode.
2. Press the Goto key (F5), then type the word 'input' and
press the ENTER key.
3. Type the following 1-2-3 command (initial letters only):
/File Combine Copy Entire-file TV
and press the ENTER key. This will bring the saved data
into the input area of your worksheet.
4. Hold down the Alt key and press M. This will return you to
the main menu.
5. Choose the Calc option. When calculations are complete, you
may use the results, graph, and print phases.
The results show a highly significant correlation. The
surprising part of the answer, however, is that the slope of the
regression line is -1.425, and not, as one might expect, -1.
Moreover the standard deviation of the slope - S(b) - is so small
(0.03) that it is inconceivable that the real slope is -1 given
the observations we have. This means the increase in the number
of television licences sold exceeds the decrease in the number of
radio licences by about 40%. Since anyone who bought or rented a
television almost certainly used a radio, one must look for an
explanation.
A possible explanation for the difference is that radio
licences were not treated seriously. It was common for people to
use unlicenced radio receivers, and few were prosecuted. But when
television licences were introduced, the need for a licence was
publicised, and users of unlicenced televisions were prosecuted
more actively. The 40% difference is likely due to people buying
a television licence when before they did not bother to buy a
radio licence. Another likely explanation is that the population
of viewers was growing in total.
The correlation coefficient (r) for this particular
regression is -0.9962, which is very close to -1 (meaning perfect
correlation). If you examine the graph, you will see that the
points do indeed lie very close to a straight line.
A note of warning is in order here. It can be very dangerous
to assume that a cause-and-effect relationship exists between two
factors, even when the fit is this good. One should take the
results of any statistical analysis with a large grain of salt,
unless there is a prior expectation of cause-and effect. In this
particular case, we do have such expectation because owning a
television licence removed the need to buy a radio licence.
Measurements such as these that are taken over a period of time
can be influenced by external factors which change over time.
This can lead to an apparent relationship when in fact none
exists. For example, television sales over some years were
positively correlated with admissions to mental institutions.
This statement could hardly justify an assertion that televisions
08/21/84 SREGR - 7
cause mental illness! Both simply happened to be growing over the
period the figures were observed.
3.2 Example 2 - Fictitious Data
This example leads you through a complete analysis of the
data given in the following table:
Linear Regression
Fictitious Data
X value Y value
1 3
3 5
5 6
7 8
9 12
11 12
Once you have loaded the SREGR worksheet, the following
keystrokes will perform the analysis:
Keystrokes Comments
{ENTER} Press the ENTER key to select the Input choice in
the main menu.
X Choose the XYPairs option in the input menu,
because this is how the source data is arranged.
Note that the macro will place the cell pointer in
the topmost cell of the X value input column.
1 {ENTER} This enters the first X value in the top cell.
The macro will move the cell pointer one column to
the left to receive the first Y value.
3 {ENTER} Enters the first Y value. The macro will now move
the cell pointer to the right and down so you can
enter the second X value.
3 {ENTER} Enter the remaining values.
5 {ENTER}
5 {ENTER}
6 {ENTER}
7 {ENTER}
8 {ENTER}
9 {ENTER}
12 {ENTER}
11 {ENTER}
12 {ENTER}
{Ctrl} {Break} Having entered all the pairs of values, hold down
the Ctrl key and press Break. This stops the
08/21/84 SREGR - 8
macro, and returns you to 1-2-3 READY mode.
{Alt} I Hold down the Alt key, and press I. This returns
you to the input menu.
E Select Est-X to obtain estimates of Y for
specified X values. The macro positions the cell
pointer at the first cell in the X input column
below the values you have already input.
0 {ENTER} Enter the X values for which estimates are
required.
2 {ENTER}
4 {ENTER}
6 {ENTER}
8 {ENTER}
10 {ENTER}
12 {ENTER}
{Ctrl} {Break} Return to the input menu.
{Alt} I
Q Return to main menu.
C Perform the calculations.
At this point, all the results have been obtained, and you
can display them by choosing Results in the main menu. Here's how
(starting at the main menu):
R Takes you to the Results menu from the main menu.
{ENTER} Displays the final results on the 'home' screen.
(Typically, you will already be on the home
screen, and this option will have no visible
effect.) On this screen, you can read the
calculated regression equation, the correlation
coefficient, and the Student's t value for testing
the significance of the regression.
I Displays the intermediate results. Here you can
see the number of (paired) observations, and the
mean of the (paired) X and Y values. The last
line on this screen shows the standard deviation
of the slope, which is used to derive the
Student's t statistic for the significance test of
the regression.
R Displays the raw data. The display starts at the
first screen-full of data, but you are presented
with a menu that allows you to scroll up and down
one screen-full at a time. Note the macro sets up
a window so the column headings are always
displayed above the data columns. You will not
need to scroll for this small set of data.
08/21/84 SREGR - 9
Q Returns you to the results menu.
Q Returns you to the main menu from the results
menu.
You can display the graph provided by this worksheet as
follows:
G Takes you from the main menu to the graphics menu.
Initially, the graphs are set up to display on a
color monitor. If you have a monochrome monitor
(with a graphics adaptor card), you should type
the following three commands:
O Selects the graphics options menu.
B Tells 1-2-3 to plot graphs in monochrome (B&W).
Q Returns you to the graphics menu.
R Displays the graph showing the raw data points,
and the regression line through them. From this
graph, you can see how well (or not) the points
follow the straight line. Press any key to return
to the graphics menu.
Q Returns you to the main menu.
If you wish, you may Quit all menus and set up the graph
title (not subtitle), and the X- and Y-axis legends to suit your
needs. Then use Alt-M to return to the menus and re-display the
graph. Since the graphics macro does not reset these parameters
each time, they will appear on the graph as you have set them.
To print the results, first make sure your printer is turned
on and loaded with paper. Then, from the main menu, type the
following:
P To go to the print menu.
R To print the final and intermediate results.
Press ENTER when you are ready for printing to
begin.
D To print the raw data and Y estimates. Press
ENTER when you are ready for printing to begin.
Q To return to the main menu.
Before starting each section of output, the macros pause for
you to align the paper. After each section of output, they space
to the top of the next page, so you should only have to align the
paper once - before the first section.
08/21/84 SREGR - 10
Menu Tree for SREGR Worksheet
Input - Input X-Y data.
. XYPairs - Input X-Y pairs of values (X first).
. YXPairs - Input Y-X pairs of values (Y first).
. Est-X - Enter X values to obtain Y estimates.
. Range - Use 1-2-3 standard Range Input facility.
. Number - Set maximum number of data pairs (erases data).
. . No - Do not erase or resize data area.
. . Yes - Erase and resize data area.
. Quit - Return to main menu.
Blank - Blank out data input area.
. No - Do not blank out input area; return to main menu.
. Yes - Blank out input area, then return to main menu.
Calc - Calculate all results.
Results - Display results on the screen.
. Final - Show final results (regression equation, etc.).
. Intermed - Show intermediate results (sums of squares, etc.).
. Rawdata - Show raw input data and Y estimates.
. . Down - Scroll down one screen.
. . Up - Scroll up one screen.
. . Quit - Return to results menu.
. Quit - Return to main menu.
Graph - Display graphs.
. Rawdata - Display graph of input data and regression line.
. Options - Select graph options.
. . Color - Display graphs in color.
. . B&W - Display graphs in monochrome.
. . Quit - Return to graph menu.
. View - Re-display the last-seen graph.
. Quit - Return to main menu.
Print - Send results to printer.
. Results - Print final and intermediate results.
. Data - Print raw data and Y estimates.
. Quit - Return to main menu.
Agenda - Return to worksheet agenda.
. No - Do not leave this worksheet; return to main menu.
. Yes - Clear this worksheet and load agenda.
Quit - Exit to 1-2-3 READY mode.
08/21/84 SREGR - 11